Spark on k8s: changes for using namespaces with Kubernetes client APIs #462
base: master
Conversation
@@ -138,6 +138,7 @@ object SparkKubernetesApp extends Logging {
   private var sessionLeakageCheckInterval: Long = _

   var kubernetesClient: DefaultKubernetesClient = _
+  var appNamespaces: mutable.Set[String] = mutable.Set("default")
Why do we need to have a default namespace at all? Can it be empty initially? Also, is it thread-safe (or can we make it so)?
Did you consider making the list of allowed namespaces configurable? We know it up front anyway when configuring permissions for the Livy service account. It's worth checking how recovery works; I haven't looked at it for a while, but having Livy monitor all the namespaces from the get-go may save us from leaked/unrecovered apps.
Then we also would not need to propagate the app namespace everywhere.
WDYT?
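For illustration only, a minimal sketch of how such an allowlist could be parsed; the livy.server.kubernetes.allowedNamespaces key and the helper name are hypothetical, not existing Livy configuration:

```scala
// Hypothetical: parse a comma-separated allowlist (e.g. the value of a
// livy.server.kubernetes.allowedNamespaces setting) into the fixed set of
// namespaces Livy would watch from startup, instead of discovering them
// lazily on first submission.
def parseAllowedNamespaces(raw: String): Set[String] =
  raw.split(",").map(_.trim).filter(_.nonEmpty).toSet

// parseAllowedNamespaces("team-a, team-b") == Set("team-a", "team-b")
```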
Thank you for the inputs, @jahstreet.
- If the user does not specify a namespace, the job goes to the default namespace; that's why we start with it. But we can remove it too, since it will get populated after the first job submission anyway. For thread safety, I will switch it to a set backed by java.util.concurrent.ConcurrentHashMap (see the sketch after this list).
- A configurable namespace allowlist would work, but it would require a redeployment every time a new namespace is allowed. RoleBindings can be created in new namespaces (to allow job submission from the Livy service account) independently of that.
- The reason we maintain this list of namespaces is leaked apps: going through all the namespaces where jobs were submitted should be enough to find them.
- Livy monitoring all namespaces is not feasible for security reasons: in a multi-tenant cluster, the Livy service account might not have any permissions on some namespaces.
- For recovery, we now also save the namespace in server/src/main/scala/org/apache/livy/server/batch/BatchSession.scala, which will be used to recover the session.
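As a rough sketch of the first and third points (thread-safe tracking and scanning only the namespaces we have seen), assuming the fabric8 client that DefaultKubernetesClient comes from; the label selector and helper names are illustrative, not the PR's actual code:

```scala
import java.util.concurrent.ConcurrentHashMap

import scala.collection.JavaConverters._

import io.fabric8.kubernetes.client.DefaultKubernetesClient

object NamespaceTrackingSketch {
  // Thread-safe set of namespaces that have received at least one app;
  // concurrent submissions can register namespaces without extra locking.
  val appNamespaces: java.util.Set[String] = ConcurrentHashMap.newKeySet[String]()

  def registerNamespace(ns: String): Unit = appNamespaces.add(ns)

  // Leak detection only needs to look at namespaces where jobs were submitted:
  // list driver pods in each tracked namespace and flag the ones no live
  // session knows about. The "spark-role" -> "driver" label is an assumption.
  def findLeakCandidates(client: DefaultKubernetesClient,
                         knownAppTags: Set[String]): Seq[String] = {
    appNamespaces.asScala.toSeq.flatMap { ns =>
      client.pods()
        .inNamespace(ns)
        .withLabel("spark-role", "driver")
        .list()
        .getItems
        .asScala
        .map(_.getMetadata.getName)
        .filterNot(name => knownAppTags.exists(name.contains))
    }
  }
}
```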
Thanks, it covers most of my thoughts 👍.
In case the namespaces are dynamic, should we also truncate the set of watched namespaces, to handle the edge cases of revoked access rights to a namespace and of namespace deletion?
That's a good point. The challenge would be deciding when to truncate. One condition I can think of is: just after deleting all leaked apps. In a rare case this might still leave some leaked apps (apps that were just added to the leaked list). I will see what other options we have; a rough sketch of the idea follows.
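A possible shape of that truncation step, reusing the watched-namespace set idea from the earlier sketch; the method and parameter names are hypothetical:

```scala
import scala.collection.JavaConverters._

// Called right after the leaked-app cleanup pass: drop every watched namespace
// that no longer hosts a tracked application, so deleted namespaces or
// namespaces with revoked access rights do not accumulate in the set.
def pruneWatchedNamespaces(watched: java.util.Set[String],
                           namespacesWithTrackedApps: Set[String]): Unit = {
  watched.asScala.retain(namespacesWithTrackedApps.contains)
}
```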
I still need to write some unit tests, but before that I want to make sure the existing tests pass. How do I get this workflow approved by a maintainer?
@gyogal can you help here please?
Thank you @ashokkumarrathore for the PR, and many thanks also to @askhatri and @jahstreet for the feedback! The unit test executor is currently broken and I am in the process of fixing it. As soon as it is working again, I will start the check jobs on this PR.
What changes were proposed in this pull request?
Related issue: #461
How was this patch tested?
Tested by deploying the updated server in a k8s cluster with namespaces and running a job.